
    Batch Size Influence on Performance of Graphic and Tensor Processing Units during Training and Inference Phases

    The impact of the maximum possible batch size (chosen for the best runtime) on the performance of graphics processing units (GPUs) and tensor processing units (TPUs) during the training and inference phases is investigated. Numerous runs of the selected deep neural network (DNN) were performed on the standard MNIST and Fashion-MNIST datasets. A significant speedup was obtained even for extremely small-scale usage of Google TPUv2 units (8 cores only) in comparison with the quite powerful NVIDIA Tesla K80 GPU card: up to 10x for the training stage (without taking overheads into account) and up to 2x for the prediction stage (with and without taking overheads into account). The precise speedup values depend on the utilization level of the TPUv2 units and increase with the volume of data being processed; for the datasets used in this work (MNIST and Fashion-MNIST, with 28x28 images), speedup was observed for batch sizes >512 images in the training phase and >40 000 images in the prediction phase. These results were obtained without detriment to the prediction accuracy and loss, which were equal for the GPU and TPU runs up to the 3rd significant digit for the MNIST dataset and up to the 2nd significant digit for the Fashion-MNIST dataset. Comment: 10 pages, 7 figures, 2 tables
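    The speedup figures quoted in this abstract are ratios of runtimes. As a minimal sketch, assuming speedup is defined as GPU time divided by TPU time for the same workload, the batch size at which the TPU starts to win could be located as follows. The function names and the timing values are hypothetical placeholders for illustration, not measurements from the paper.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return baseline time divided by candidate time (>1 means the candidate is faster)."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_seconds / candidate_seconds


def first_batch_with_speedup(timings: dict) -> "int | None":
    """Return the smallest batch size whose speedup exceeds 1, or None.

    `timings` maps batch size -> (gpu_seconds, tpu_seconds).
    """
    for batch_size in sorted(timings):
        gpu_seconds, tpu_seconds = timings[batch_size]
        if speedup(gpu_seconds, tpu_seconds) > 1.0:
            return batch_size
    return None


# Made-up placeholder timings (NOT results from the paper), in seconds.
example_timings = {
    128: (1.0, 2.0),   # TPU slower than GPU
    512: (1.0, 1.0),   # break-even
    1024: (2.0, 1.0),  # TPU twice as fast
}
crossover = first_batch_with_speedup(example_timings)
```

    With real measurements in place of the placeholders, `crossover` would be the kind of threshold the abstract reports (batch sizes >512 for training).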

    From Quantity to Quality: Massive Molecular Dynamics Simulation of Nanostructures under Plastic Deformation in Desktop and Service Grid Distributed Computing Infrastructure

    A distributed computing infrastructure (DCI) based on BOINC and EDGeS-bridge technologies for high-performance distributed computing is used to port a sequential molecular dynamics (MD) application to a parallel version for a DCI with Desktop Grids (DGs) and Service Grids (SGs). The actual metrics of the working DG-SG DCI were measured; a normal distribution of host performance and signs of log-normal distributions of other characteristics (CPUs, RAM, and HDD per host) were found. The practical feasibility and high efficiency of MD simulations on the DG-SG DCI were demonstrated in an experiment with massive MD simulations of a large number of aluminum nanocrystals ($\sim 10^2$-$10^3$). Statistical analysis (Kolmogorov-Smirnov test, moment analysis, and bootstrapping analysis) of the defect density distribution over the ensemble of nanocrystals showed that a change of plastic deformation mode is followed by a qualitative change in the type of defect density distribution over the ensemble. Some limitations of the typical DG-SG DCI (fluctuating performance, unpredictable availability of resources, etc.) were outlined, and some advantages (high efficiency, high speedup, and low cost) were demonstrated. Deploying on a DG DCI makes it possible to obtain new scientific quality from the simulated quantity of numerous configurations by harnessing sufficient computational power to undertake MD simulations over a wider range of physical parameters (configurations) in a much shorter timeframe. Comment: 13 pages, 11 pages (http://journals.agh.edu.pl/csci/article/view/106

    Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

    Recently, due to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. It was found that the running times grow very fast with the dataset size, even faster than a power function. This tendency is similar for the real and virtual cluster implementations and for both the Hadoop and Spark frameworks. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present. Comment: 5 pages, 1 table, 2017 IEEE International Young Scientists Forum on Applied Physics and Engineering (YSF-2017) (Lviv, Ukraine
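    The word-counting test task mentioned in this abstract follows the map-reduce pattern: each node counts words in its own chunk of text, and the partial counts are then merged. The following is a minimal single-process sketch of that pattern in plain Python, assuming a simple lowercase tokenizer; it does not use the Hadoop or Spark APIs, and the function names and sample chunks are hypothetical.

```python
import re
from collections import Counter
from functools import reduce


def map_count(chunk: str) -> Counter:
    """Map step: count the words in one chunk of text (case-insensitive)."""
    return Counter(re.findall(r"[a-z0-9']+", chunk.lower()))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(chunks) -> Counter:
    """Apply the map step to every chunk, then fold the partial counts together."""
    return reduce(reduce_counts, map(map_count, chunks), Counter())


# Illustrative input: each string stands in for the text held by one node.
sample_chunks = ["big data big clusters", "Big Data, small clusters"]
totals = word_count(sample_chunks)
```

    In a distributed framework the map step would run in parallel on separate nodes and the reduce step would combine results across the network, which is where the dataset-size effects described in the abstract arise.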

    Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes

    The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. A comparative analysis was performed and conclusions were drawn as to the advantages and disadvantages of these platforms. Performance tests on the de facto standard MNIST data set were carried out on the H2O framework for deep learning algorithms designed for CPU and GPU platforms, in single-threaded and multithreaded modes of operation. We also present the results of testing neural network architectures on the H2O platform for various activation functions, stopping metrics, and other parameters of the machine learning algorithm. It was demonstrated for the use case of the MNIST database of handwritten digits in single-threaded mode that blind selection of these parameters can hugely increase the runtime (by 2-3 orders of magnitude) without a significant increase in precision. This result can have a crucial influence on the optimization of available and new machine learning methods, especially for image recognition problems. Comment: 15 pages, 11 figures, 4 tables; this paper summarizes the activities which were started recently and described shortly in the previous conference presentations arXiv:1706.02248 and arXiv:1707.04940; it is accepted for Springer book series "Advances in Intelligent Systems and Computing